@Diolor
Collaborator

@Diolor Diolor commented Sep 26, 2025

This PR closes #2942

Description

Port MASTG-TEST-0004: Sensitive Data Leaked via Embedded Libraries (android)

As I could not create a generic Semgrep rule for all kinds of libraries, the demo uses Firebase Analytics as an example.


TODOs before merging:

@Diolor Diolor self-assigned this Sep 26, 2025
@Diolor Diolor marked this pull request as ready for review October 2, 2025 08:10
@Diolor Diolor added the Android label Oct 6, 2025
@Diolor Diolor changed the title Port MASTG-TEST-0004: Sensitive Data Leaked via Embedded Libraries (android) Port MASTG-TEST-0004: App Exposing Sensitive Data to Embedded Libraries Oct 6, 2025
@cpholguera
Collaborator

A couple of things are going on here:

First of all, just to clarify the scope: the weaknesses we should be targeting are:

  • "MASWE-0112: Inadequate Data Collection Declarations"
  • "MASWE-0111: Inadequate Privacy Policy"
  • We also have "MASWE-0108: Sensitive Data in Network Traffic," but this should be deprecated as it is covered by the two above. In the end, the weakness isn't that the sensitive data can be found (encrypted!) in network traffic. It's only an issue if that's not properly declared.
  • Note that for sensitive data in cleartext connections, we have MASWE-0050.

Test

The original test has several parts.

Part 1

To determine whether API calls and functions provided by the third-party library are used according to best practices, review their source code, requested permissions, and check for any known vulnerabilities.

This isn't good, and parts of it actually belong somewhere else.

Your new test addresses the "identification of potentially sensitive data that may have been passed to embedded third-party libraries used by the application."

TODO:

  • Let's keep this test but only its "Method 2," point 2.
    • title: App Exposing Sensitive Data to Embedded Libraries -> Update to something like "References to SDK APIs Known to Handle Sensitive Data"
  • Let's open an issue to create a new test that uses "Method 2," point 1.
    • title: Use of Third-Party Tracking & Analytics SDKs (something like this)
  • Let's open an issue to create a new test that uses "Method 1" with MASTG-TECH-0119: Intercepting HTTP Traffic by Hooking Network APIs at the Application Layer

Part 2

All data that's sent to third-party services should be anonymized to prevent exposure of PII (Personally Identifiable Information) that would allow the third party to identify the user account. No other data (such as IDs that can be mapped to a user account or session) should be sent to a third party.

This seems to be related to MASWE-0109: Lack of Anonymization or Pseudonymisation Measures.

TODO:

  • Let's open an issue for this part, which will require additional research work.

Part 3

Check all requests to external services for embedded sensitive information. [...]

This is MASTG-TEST-0206: Undeclared PII in Network Traffic Capture and the new suggested test that uses MASTG-TECH-0119 ("Method 1" in your current test).

Demo

The proposed demo is a bit misleading. When analyzing the issue statically, you won't find the sensitive user data (e.g., email, name, username) in the results. We would need to make this more realistic.

  • You can only find the use of third-party SDK APIs that POTENTIALLY will hold sensitive data, e.g., in your example from Firebase (eventBundle.putString, analytics.logEvent, analytics.setUserId, etc.).
  • The test should explicitly say that only calls to those methods coming from the main app files and modules are valid. Otherwise, you'll have false positives just because the app contains the SDK, which contains those methods.
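To illustrate the second point, here is a minimal sketch of restricting findings to the app's own code. It is not the PR's Semgrep rule: it assumes decompiled sources are available as a path-to-text mapping, and the package name `com.example.app`, the file paths, and the sample source lines are all made up for the example. Only `logEvent` and `setUserId` (the method names mentioned above) are matched.

```python
import re

# Firebase Analytics method names mentioned in this review; extend as needed.
SDK_CALL = re.compile(r"\.(logEvent|setUserId)\s*\(")


def is_app_code(path, app_packages):
    """Return True if the path lies under one of the app's own package directories."""
    normalized = "/" + path.replace("\\", "/")
    return any("/" + pkg.replace(".", "/") + "/" in normalized for pkg in app_packages)


def find_sdk_calls(sources, app_packages):
    """Return (path, line number, method) for SDK calls made from app code only."""
    findings = []
    for path, text in sources.items():
        if not is_app_code(path, app_packages):
            continue  # skip the SDK's own classes to avoid false positives
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = SDK_CALL.search(line)
            if match:
                findings.append((path, lineno, match.group(1)))
    return findings


# Hypothetical decompiled sources: one app file and one SDK file.
SOURCES = {
    "sources/com/example/app/MainActivity.java":
        'analytics.setUserId(email);\nanalytics.logEvent("signup", eventBundle);\n',
    "sources/com/google/firebase/analytics/FirebaseAnalytics.java":
        "this.impl.logEvent(name, params);\n",
}

for finding in find_sdk_calls(SOURCES, ["com.example.app"]):
    print(finding)
```

Note that the result only tells you where the app calls these APIs. Whether the arguments actually hold PII at runtime cannot be decided from this output.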

TODO: update demo title to "Uses of Firebase Analytics APIs on Potential PII with Semgrep"

New Demo

We need a dynamic demo and test: we hook all those APIs and will find out which ones are actually used and what they contain.

We can then correlate that with the hooks to the network APIs and the traffic capture.
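As a rough sketch of that correlation step: assume the SDK API hooks produce records of the string arguments they saw, and the network hooks or traffic capture produce records of request URLs and bodies. Both record types (`HookEvent`, `Request`), the URLs, and the sample values below are hypothetical. The match is a plain substring search, so it would miss values that are encoded, hashed, or compressed before transmission.

```python
from dataclasses import dataclass, field


@dataclass
class HookEvent:
    api: str                                    # e.g. "FirebaseAnalytics.setUserId"
    values: list = field(default_factory=list)  # string arguments seen at the hook


@dataclass
class Request:
    url: str
    body: str = ""


def correlate(hook_events, requests, min_len=4):
    """Return (api, value, url) for each hooked value that also appears in a request."""
    matches = []
    for event in hook_events:
        for value in event.values:
            if len(value) < min_len:
                continue  # very short values match almost anything
            for request in requests:
                if value in request.url or value in request.body:
                    matches.append((event.api, value, request.url))
    return matches


# Hypothetical observations from a test run.
HOOKS = [
    HookEvent("FirebaseAnalytics.setUserId", ["user@example.com"]),
    HookEvent("FirebaseAnalytics.logEvent", ["ok"]),
]
REQUESTS = [
    Request("https://analytics.example.com/collect", "uid=user@example.com&v=1"),
    Request("https://api.example.com/ping"),
]

for match in correlate(HOOKS, REQUESTS):
    print(match)
```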

For the data we detect using these tests, the final questions are:

  • Does it lack anonymization or pseudonymisation (MASWE-0109)?
  • Is it properly declared/disclosed to the app user? (MASWE-0111, MASWE-0112)
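The second question could be checked with something as simple as a set difference, sketched below. This assumes the tester has already mapped the observed data to categories per SDK and extracted the declared categories from the privacy policy or data collection declarations; the SDK names and category labels are invented for the example.

```python
def undeclared(observed, declared):
    """Map each SDK to the observed data categories missing from the declarations."""
    gaps = {}
    for sdk, categories in observed.items():
        missing = set(categories) - set(declared)
        if missing:
            gaps[sdk] = missing
    return gaps


# Hypothetical inputs: categories seen per SDK vs. categories the app declares.
OBSERVED = {
    "Firebase Analytics": {"email", "user_id"},
    "Crash reporter": {"device_model"},
}
DECLARED = {"user_id", "device_model"}

print(undeclared(OBSERVED, DECLARED))
```

The anonymization question (MASWE-0109) is not covered by this sketch and needs separate analysis of the values themselves.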

Putting it all together

Now we have a pretty solid test strategy:

Static:

  • We use static analysis to identify which Tracking & Analytics SDKs the app uses (new separate test).
  • Using that information, we use static analysis again to check for uses of those SDKs' APIs (this PR's test).

Dynamic:

  • We use dynamic analysis (hooking) on the discovered APIs to learn which ones are actually called and what sensitive data they handle (new separate test).
  • We use dynamic analysis on the networking APIs to see what sensitive data is sent over the network and by which SDK (new separate test).

Network:

  • We use network analysis (traffic capture-based) as proof for the actual transmission (new separate test).

Summary of Required Actions for This PR

  1. Limit the current test to only “Method 2, point 2.”
    • Update the test title to: “References to SDK APIs Known to Handle Sensitive Data.”
  2. Clarify the scope in the demo section.
    • Update the demo title to: “Uses of Firebase Analytics APIs on Potential PII with Semgrep.”
    • Add a note specifying that only calls from the app’s own files/modules are considered valid (to avoid false positives from SDK code).
  3. Remove or relocate content that belongs in other tests.
    • Remove any parts related to:
      • identifying third-party SDKs,
      • anonymization requirements,
      • network traffic analysis.
    • Keep the PR focused strictly on detecting potential sensitive-data-handling SDK API usage via static analysis.
  4. Clarify the limitations of static analysis in this test.
    • Add text stating that static analysis detects only potential sensitive data handling, not actual user data.

Separate Follow-Up Issues (not part of this PR)

Create issues for:

  1. A new test based on Method 2, point 1:
    “Use of Third-Party Tracking & Analytics SDKs.”
  2. A new test based on Method 1 using MASTG-TECH-0119 (network API hooking).
  3. A new test for anonymization/pseudonymisation (MASWE-0109).
  4. A new dynamic test for discovering actual sensitive data passed to SDKs.
  5. A new network-analysis test for verifying transmission of sensitive data.


Development

Successfully merging this pull request may close these issues.

MASTG v1->v2 MASTG-TEST-0004: Determining Whether Sensitive Data Is Shared with Third Parties via Embedded Services (android)
